Optimal discrete controller synthesis for the modeling of fault-tolerant distributed systems
Authors: E. Dumitrescu, A. Girault, H. Marchand, É. Rutten
Abstract
Embedded systems require safe design methods based on formal methods, as well as safe execution based on fault-tolerance techniques. We propose a safe design method for safe execution systems: it uses optimal discrete controller synthesis (DCS) to generate a correct reconfiguring fault-tolerant system. The properties enforced concern consistent execution, functionality fulfillment (whatever the faults, under some failure hypothesis), and several optimizations, particularly on the execution time when going through checkpoints. We propose an algorithm for optimal DCS on bounded paths. We propose model patterns for a set of periodic tasks with checkpoints, a set of distributed, heterogeneous and fail-silent processors, and an environment model that expresses the potential fault patterns. We use synchronous models, the Sigali symbolic DCS tool and Mode Automata.

Key-words: Real-time systems, safe design, fault tolerance, optimal discrete control synthesis, synchronous systems.

∗ INSA Lyon, http://www.insa-lyon.fr
† INRIA Rhône-Alpes, POP ART, http://pop-art.inrialpes.fr/people/girault
‡ IRISA/INRIA-Rennes, Vertecs, http://www.irisa.fr/prive/hmarchan
§ INRIA Rhône-Alpes, POP ART, http://pop-art.inrialpes.fr/people/rutten

RR n° 6137, inria-00134550, version 2 - 6 Mar 2007

Synthèse de contrôleurs discrets optimale pour la modélisation de systèmes distribués tolérants aux fautes

Résumé: Embedded systems require safe design methods based on formal methods, as well as safe execution based on fault-tolerance techniques. We propose a safe design method for systems with safe execution: it uses discrete controller synthesis to generate a correct, reconfigurable fault-tolerant system. The properties ensured concern consistent execution, fulfillment of the functionality (whatever the faults, under a given failure hypothesis), and several optimizations, notably on the duration of executions that go through checkpoints. We propose an algorithm for optimal discrete controller synthesis on bounded paths. We propose model patterns for a set of periodic tasks with checkpoints, a set of distributed, heterogeneous and fail-silent processors, and an environment model that expresses the potential fault patterns. We use synchronous models, the symbolic DCS tool Sigali, and Mode Automata.

Keywords: Real-time systems, safe design, fault tolerance, discrete controller synthesis, synchronous programming.

1 Motivation

The motivation of this work is to propose a methodology based on discrete controller synthesis, with optimal synthesis on bounded paths, in order to model, design, and optimize fault-tolerant distributed systems.

1.1 Safety critical embedded systems

Embedded systems account for a major part of critical applications (space, aeronautics, nuclear, ...) as well as public domain applications (automotive, consumer electronics, ...).
Their main features are:

- duality automatic-control/discrete-event: they include control laws modeled as differential equations in sampled time, computed iteratively, and discrete event systems to sequence the control laws according to mode switches;
- critical real-time: unmet timing constraints may cause a system failure leading to a disaster;
- limited resources: they rely on limited computing power and memory because of weight and bulk, power consumption (autonomous vehicles or portable devices), radiation resistance (nuclear or space), or price constraints (consumer electronics);
- distributed and heterogeneous architecture: they are often distributed to provide enough computing power, to keep computing sites close to the sensors and actuators, and to allow fault-tolerance.

1.2 Problem statement

Since an embedded system is intrinsically critical, it is essential to ensure that it tolerates processor failures. This can even be the motivation for distributing it. In such a case, at the very least, the loss of one computing site must not lead to the loss of the whole application.

We are interested in formal methods to model systems with guarantees on their fault-tolerance capabilities. Among the various existing formal methods, we investigate the use of discrete controller synthesis (DCS). The advantages of using DCS are the correctness of the resulting system and the easy modifiability of the controller (thanks to automatic tools), i.e., the possibility to study and test several fault-tolerance objectives or failure hypotheses on the same system model, without the need to re-design the system.

Specifically, our objective is to automatically produce a controller enforcing fault-tolerance for a given distributed system. Fault-tolerance is the ability to maintain the functionality of a system, whatever the faults, under some failure hypothesis.
To achieve this, we first need to model our distributed systems, and second to express formally some fault-tolerance objective, in terms of events and states of the system.

We propose to designers a methodology for modeling a system and studying the existence of fault-tolerant solutions according to several failure hypotheses and system configurations. When a solution is found, it can be used either as a guideline for implementation (if the model was an abstract one [12]) or for deployment with a dynamic reconfiguration-on-failure feature (this paper).

In our approach, a system consists of a set of real-time periodic tasks placed in a configuration onto a set of processors. Each task has a known execution cost and quality on each processor. Upon the occurrence of a fault, one or several processors become unusable, and tasks must be placed anew in another configuration, by migrating them onto another processor, so that execution can proceed. These reconfigurations of the system have to be controlled according to a fault-tolerance policy, enforced by a task manager. The latter is specified in terms of properties concerning placement constraints, reachability of termination, and optimization of costs and qualities.

1.3 Contributions

We propose to automatically produce the task manager with DCS techniques, applied to a model of the system in all its possible configurations. This model consists of several components, each modeled as a labeled transition system (LTS), and composed in parallel; DCS produces a property-enforcing layer on top of the components [1]. We extend previous results [13] by considering tasks with checkpoints, and by using optimal DCS along paths.
We design and implement an algorithm for optimal DCS on bounded paths reaching a target configuration, in which we introduce the possibility of optimizing systems containing 0-cost wait states. To the best of our knowledge, this feature is not available in classical optimal synthesis approaches. Yet, it is very useful in reactive systems where some states correspond to waiting for input events.

The technical context of our work is the synchronous approach for the design of reactive systems [5]. This choice is motivated by the existence of a corpus of available results (programming languages, compilers, formal tools) and technologies, which already have an industrial impact. Our method is based on synchronous models, which influences some of our choices in the design of the LTSs and of their parallel composition. It also builds on existing DCS models and tools [22], which we extended with the optimal DCS algorithm for bounded paths.

1.4 An introductory example

To motivate the contributions concretely, we will use the example of a task on an architecture with two processors. The task is initially idle and, upon a request r, goes into a ready state. From there, it can be started on either processor; the choice is given through two exclusive events: a1 for processor 1, and a2 for processor 2. The architecture can be heterogeneous, and the performance of the task can vary across processors, in terms of computation time, energy consumption, quality of service, etc. The task has two phases: A, followed by B; between the two phases, there is a checkpoint event c. Upon reception of a second event c in phase B, the task terminates.

We will be modeling and controlling the configurations of this task, executing on this architecture, in reaction to faults consisting of processor failures.
In this case, the task will be migrated and executed on the remaining safe processor. This can happen at any point during the execution of the task, in phase A or in phase B. When the migration occurs after the checkpoint, the task is restarted in the second phase, and not from the beginning.

We want to model all these possible evolutions of the system in order to compute a controller (typically, one deciding upon events a1 and a2) that, for all possible evolutions of the architecture, will keep the task running (i.e., never assigned to a faulty processor) up to completion (i.e., reaching termination). For this, we build a model of all the configurations of the task and of the transitions it can make between them. We do this in terms of labeled transition systems (LTS), as shown in Figure 1.
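The introductory example can be made concrete with a small executable model. The following is a minimal sketch under stated assumptions, not the paper's model: it uses an interleaving semantics (one event per step) instead of synchronous Mode Automata, it performs plain safety and non-blocking synthesis by explicit state enumeration rather than optimal symbolic synthesis with Sigali, and it assumes that at most one processor fails. The failure event names `f1` and `f2`, the state encoding, and the functions `step`, `reachable` and `synthesize` are all hypothetical names introduced for illustration.

```python
# Illustrative sketch only: interleaving semantics, explicit state enumeration.
PROCS = frozenset({1, 2})
CONTROLLABLE = {"a1", "a2"}              # start the task on processor 1 or 2
UNCONTROLLABLE = {"r", "c", "f1", "f2"}  # request, checkpoint, failures (assumed names)
EVENTS = CONTROLLABLE | UNCONTROLLABLE


def step(state, event):
    """Successor of `state` under `event`, or None if the event is not enabled.

    state = (task, alive); task is ("idle",), ("ready", phase),
    ("run", phase, proc), ("done",) or ("error",); alive is a frozenset.
    """
    task, alive = state
    kind = task[0]
    if event == "r" and kind == "idle":
        return (("ready", "A"), alive)
    if event in CONTROLLABLE and kind == "ready":
        proc = int(event[1])
        if proc not in alive:
            return (("error",), alive)  # task assigned to a faulty processor
        return (("run", task[1], proc), alive)
    if event == "c" and kind == "run":
        if task[1] == "A":
            return (("run", "B", task[2]), alive)  # checkpoint between A and B
        return (("done",), alive)                  # second c: termination
    if event in ("f1", "f2"):
        if alive != PROCS:  # assumed hypothesis: at most one failure
            return None
        proc = int(event[1])
        survivors = alive - {proc}
        if kind == "run" and task[2] == proc:
            return (("ready", task[1]), survivors)  # redo the current phase elsewhere
        return (task, survivors)
    return None


def reachable(init, allowed=lambda s, e: True):
    """States reachable from `init` using only transitions that `allowed` permits."""
    seen, todo = {init}, [init]
    while todo:
        s = todo.pop()
        for e in EVENTS:
            t = step(s, e)
            if t is not None and allowed(s, e) and t not in seen:
                seen.add(t)
                todo.append(t)
    return seen


def synthesize(init):
    """Return (allowed, bad): a controller predicate and the states it avoids."""
    states = reachable(init)
    bad = {s for s in states if s[0] == ("error",)}
    while True:
        good = states - bad
        # States that can still reach termination while staying in `good`.
        can_finish = {s for s in good if s[0] == ("done",)}
        changed = True
        while changed:
            changed = False
            for s in good - can_finish:
                if any(step(s, e) in can_finish for e in EVENTS):
                    can_finish.add(s)
                    changed = True
        # Bad: blocking states, or states the environment can push into `bad`.
        new_bad = (good - can_finish) | {
            s for s in good if any(step(s, e) in bad for e in UNCONTROLLABLE)
        }
        if not new_bad:
            break
        bad |= new_bad
    return (lambda s, e: e in UNCONTROLLABLE or step(s, e) not in bad), bad


init = (("idle",), PROCS)
allowed, bad = synthesize(init)
controlled = reachable(init, allowed)
```

In this sketch, `allowed(state, event)` plays the role of the controller: it never blocks uncontrollable events and only disables `a1` or `a2` when the resulting state would be an error or could no longer reach termination. Costs and bounded-path optimization are deliberately left out.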
Similar resources
Optimal Discrete Controller Synthesis for Modeling Fault-tolerant Distributed Systems
We propose a safe design method for safe execution systems, based on fault-tolerance techniques: it uses optimal discrete controller synthesis (DCS) to generate a correct-by-construction fault-tolerant system. The properties enforced concern consistent execution, functionality fulfillment (whatever the faults, under some failure hypothesis), and several optimizations (of the tasks’ execution tim...
Multicriteria optimal reconfiguration of fault-tolerant real-time tasks
We propose a technique for discrete controller synthesis, with optimal synthesis on bounded paths, in order to model, design, and optimize fault-tolerant distributed systems, taking into account several criteria (e.g., the execution costs of the tasks and their quality of service). Different combinations are explored for multi-criteria optimization.
Modeling Fault-tolerant Distributed Systems for Discrete Controller Synthesis
Embedded systems require safe design methods based on formal methods, as well as safe execution based on fault-tolerance techniques. We propose a safe design method for safe execution systems: it uses discrete controller synthesis (DCS) to generate a correct reconfiguring system. The properties enforced concern consistent execution, functionality fulfillment (whatever the faults, under some fai...
Optimal nonlinear control of flight faults in manned aircrafts in the presence of fault and failure of control actuato
Control actuator faults are among the major causes of loss of aircraft control during flight. The aircraft dynamics depend heavily on faults and errors in the flight control systems, and if the fault-tolerant controller does not issue a corrected control command, there can be undesirable outcomes such as inconsistency, reduced system performance, and some dreadful aerial accide...
Optimal Finite-time Control of Positive Linear Discrete-time Systems
This paper considers solving an optimization problem for linear discrete-time systems such that the closed-loop discrete-time system is positive (i.e., all of its state variables have non-negative values) and also finite-time stable. For this purpose, by considering a quadratic cost function, an optimal controller is designed such that in addition to minimizing the cost function, the positivity proper...